Abstract

In this paper we compare the performance characteristics of our selection-based
learning algorithm for Web crawlers with those of the reinforcement learning
algorithm. The task of the crawlers is to find new information on the Web. The
selection algorithm, called weblog update, modifies the starting URL lists of our
crawlers based on the found URLs containing new information. The reinforcement
learning algorithm modifies the URL orderings of the crawlers based on the
reinforcements received for submitted documents. We performed simulations based on
data collected from the Web. The collected portion of the Web is typical and exhibits
a scale-free small world (SFSW) structure. We have found that on this SFSW, the
weblog update algorithm performs better than the reinforcement learning algorithm:
it finds the new information faster and achieves a better ratio of new information
to all submitted documents. We believe that the advantages of the selection
algorithm over the reinforcement learning algorithm are due to the small world
property of the Web.

1 Introduction 

The largest source of information today is the World Wide Web. The estimated 
number of documents nears 10 billion. Similarly, the number of documents changing 
on a daily basis is also enormous. The ever-increasing growth of the Web presents a 
considerable challenge in finding novel information on the Web. 

In addition, properties of the Web, like its scale-free small world (SFSW) structure
[1, 12], may create additional challenges. For example, a direct consequence of the
scale-free small world property is that there are numerous URLs, or sets of interlinked
URLs, which have a large number of incoming links. Intelligent Web crawlers can
easily be trapped in the neighborhood of such junctions, as has been shown previously.

We have developed a novel artificial life (A-life) method with intelligent individ- 
uals, crawlers, to detect new information on a news Web site. We define A-life as 
a population of individuals having both static structural properties, and structural
properties which may undergo continuous changes, i.e., adaptation. Our algorithms 
are based on methods developed for different areas of artificial intelligence, such as 
evolutionary computing, artificial neural networks and reinforcement learning. All ef- 
forts were made to keep the applied algorithms as simple as possible subject to the 
constraints of the internet search. 

Evolutionary computing deals with properties that may be modified during the creation
of new individuals, called 'multiplication'. Descendants may exhibit variations within
the population and differ in performance from the others. Individuals may also
terminate. Multiplication and selection are subject to the fitness of individuals, where
fitness is typically defined by the modeler. For a recent review on evolutionary
computing, see [7]. For reviews on related evolutionary theories and on the dynamics of
self-modifying systems, see [8, 4] and [11, 15], respectively. Similar concepts have been
studied in other evolutionary systems where organisms compete for space and resources
and cooperate through direct interaction (see, e.g., [19] and references therein).

Selection, however, is a very slow process, and individual adaptation may be necessary
in environments subject to quick changes. The typical form of adaptive learning is the
connectionist architecture, such as artificial neural networks. Multilayer perceptrons
(MLPs), which are universal function approximators, have been used widely in diverse
applications. Evolutionary selection of adapting MLPs has been the focus of extensive
research [32, 33].

In a typical reinforcement learning (RL) problem the learning process [27] is
motivated by the expected value of the long-term cumulated profit. A well-known example
of reinforcement learning is the TD-Gammon program of Tesauro [29], which applied MLP
function approximators for value estimation. Reinforcement learning has also been used
in concurrent multi-robot learning, where robots had to learn to forage together via
direct interaction [16]. Evolutionary learning has been used within the
framework of reinforcement learning to improve decision making, i.e., the state-action 
mapping called policy |231ITgll3DllT2) . 

In this paper we present a selection-based algorithm and compare it to the well-known
reinforcement learning algorithm in terms of efficiency and behavior. In our problem,
fitness is not defined by us; it is implicit, jointly determined by the ever-changing
external world and by the competing individuals. Selection and multiplication of
individuals are based on their fitness values. Communication and competition among our
crawlers are indirect: only the first submitter of a document may receive positive
reinforcement. Our work differs from other studies using combinations of genetic,
evolutionary, function approximation, and reinforcement learning algorithms in that
i) it does not require an explicit fitness function, ii) we do not have control over the
environment, iii) collaborating individuals use value estimation under 'evolutionary
pressure', and iv) individuals work without direct interaction with each other.

We performed realistic simulations based on data collected during an 18-day crawl on
the Web. We have found that our selection-based weblog update algorithm performs better
than the RL algorithm in a scale-free small world environment, even though the
reinforcement learning algorithm has been shown to be efficient in finding relevant
information [15, 21]. We explain our results by the different behaviors of the
algorithms. The weblog update algorithm finds good sources of relevant documents and
remains in these regions until better places are found by chance. Individuals using this
selection algorithm are able to quickly collect the new relevant documents from the
already known places because they monitor these places continuously. The reinforcement
learning algorithm explores new territories for relevant documents, and if it finds a
good place, it collects the existing relevant documents from there. Because of this
continuous exploration, RL finds relevant documents more slowly than the weblog update
algorithm. Also, crawlers using the weblog update algorithm submit more distinct
documents than crawlers using the RL algorithm. Therefore, there is more new relevant
information among the documents submitted by the former than by the latter.

The paper is organized as follows. In Section 2 we review recent work in the field of
Web crawling. Then we describe our algorithms and the forager architecture in Section 3.
After that, in Section 4 we present our experiment on the Web and the conducted
simulations with the results. In Section 5 we discuss our results on the different
behaviors found for the selection and reinforcement learning algorithms. Section 6
concludes our paper.

2 Related work 

Our work concerns a realistic Web environment and search algorithms over this
environment. We compare selective/evolutionary and reinforcement learning methods. It
seems to us that such studies should be conducted in ever-changing, buzzing, wobbling
environments, which justifies our choice of the environment. We shall review several of
the known search tools, including those [10, 15] that our work is based upon. Readers
familiar with search tools utilized on the Web may wish to skip this section.

There are three main problems that have been studied in the context of crawlers.
Rungsawang et al. [23] (and references therein) and Menczer [17] studied topic-specific
crawlers. Risvik et al. [22] (and references therein) address research issues related to
the exponential growth of the Web. Cho and Garcia-Molina [3], Menczer [17] and Edwards
et al. [6] (and references therein) study the problem of different refresh rates of URLs
(possibly as high as hourly or as low as yearly).

Rungsawang and Angkawattanawit [23] provide an introduction to and a broad overview of
topic-specific crawlers (see citations in the paper). They propose to learn starting
URLs, topic keywords and URL ordering through consecutive crawling attempts. They show
that learning starting URLs and using consecutive crawling attempts can increase the
efficiency of the crawlers. Their heuristic is similar to the weblog algorithm [9],
which also finds good starting URLs and periodically restarts crawling from the newly
learned ones. The main limitation of this work is that it cannot address the freshness
(i.e., modification) of already visited Web pages.

Menczer [17] describes some disadvantages of current Web search engines on the
dynamic Web, e.g., the low ratio of fresh or relevant documents. He proposes to 
complement the search engines with intelligent crawlers, or web mining agents to 
overcome those disadvantages. Search engines take static snapshots of the Web with 
relatively large time intervals between two snapshots. Intelligent web mining agents 
are different: they can find online the required recent information and may evolve 
intelligent behavior by exploiting the Web linkage and textual information. 

He introduces the InfoSpider architecture, which uses a genetic algorithm and
reinforcement learning, and describes its MySpider implementation. Menczer discusses the
difficulties of evaluating online, query-driven crawler agents. The main problem is that
the whole set of relevant documents for any given query is unknown; only a subset of the
relevant documents may be known. To solve this problem he introduces two new metrics
that estimate the real recall and precision based on an available subset of the relevant
documents. With these metrics, the performances of search engines and online crawlers
can be compared. When the MySpider agent is started from the 100 top pages of AltaVista,
the agent's precision is better than AltaVista's precision even during the first few
steps of the agent.

The fact that the MySpider agent finds relevant pages in the first few steps may make it
deployable on users' computers. Some problems may arise from this kind of agent usage.
First of all, there are security issues, such as which files or information sources the
agent is allowed to read and write. The run time of the agents should be controlled
carefully because there can be many users (Google answered more than 100 million
searches per day in January-February 2001) using these agents, thus creating a huge
traffic overhead on the Internet.

Our weblog algorithm uses local selection for finding good starting URLs for searches,
thus not depending on any search engine. Dependence on a search engine can be a severe
limitation of most existing search agents, like MySpiders. Note, however, that it is
easy to combine the present algorithm with URLs offered by search engines. Also, our
algorithm should not run on individual users' computers. Rather, it should run for
different topics near the source of the documents on the given topic, e.g., at the
actual site where relevant information is stored.

Risvik and Michelsen [22] mention that because of the exponential growth of the Web
there is an ever-increasing need for more intelligent, (topic-)specific algorithms for
crawling, like focused crawling and document classification. With these algorithms,
crawlers and search engines can operate more efficiently in a topically limited document
space. The authors also state that in such vertical regions the dynamics of the Web
pages is more homogeneous.

They survey different dimensions of Web dynamics and show the problems that arise in a
search engine model. They show that the rapid growth of the Web and frequent document
updates create new challenges for developing ever more efficient Web search engines.
The authors define a reference search engine model with three main components:
(1) crawler, (2) indexer, (3) searcher. The main part of the paper focuses on the
problems that crawlers need to overcome on the dynamic Web. As a possible solution, the
authors propose a heterogeneous crawling architecture. They also present an extensible
indexer and searcher architecture. The crawling architecture has a central distributor
that knows which crawler has to crawl which part of the Web. Special crawlers with low
storage and high processing capacity are dedicated to Web regions where content changes
rapidly (like news sites). These crawlers maintain up-to-date information on these
rapidly changing Web pages.

The main limitation of their crawling architecture is that the Web to be crawled must
be divided into distinct portions manually before crawling starts. A weblog-like
distributed algorithm, as suggested here, may be used in that architecture to overcome
this limitation.

Cho and Garcia-Molina [3] mathematically define the freshness and age of documents of
search engines. They propose the Poisson process as a model for page
refreshment. The authors also propose various refresh policies and study their effec- 
tiveness both theoretically and on real data. They present the optimal refresh policies 
for their freshness and age metrics under the Poisson page refresh model. The authors 
show that these policies are superior to others on real data, too. 

They collected about 720,000 documents from 270 sites. Although they show that more than
20 percent of the documents in their database change each day, they excluded these
documents from their studies. Their crawler visited the documents once each day for
5 months, and thus could not measure the exact change rate of those documents. In
contrast, our work concentrates specifically on these frequently changing documents.

The proposed refresh policies require a good estimate of the refresh rate of each
document. The estimate influences the revisit frequency, while the revisit frequency
influences the estimate. Our algorithm does not need explicit frequency estimates. The
more valuable URLs (e.g., more frequently changing ones) will be visited more often, and
if a crawler does not find valuable information around a URL in its weblog, then that
URL will eventually fall out of the crawler's weblog. However, frequency estimates and
refresh policies can easily be integrated into the weblog algorithm by selecting the
starting URL from the weblog according to the refresh policy and weighting each URL in
the weblog according to its estimated change frequency.

Menczer [17] also introduces a recency metric, which is 1 if all of the documents are
recent (i.e., not changed after the last download) and goes to 0 as the downloaded
documents become more and more obsolete. Trivially, after an online crawler has run for
only a few minutes, the value of this metric will be 1, while the value for the search
engine will be lower.

Edwards et al. [6] present a mathematical crawler model in which the number of obsolete
pages can be minimized with a nonlinear equation system. They solved the nonlinear
equations with different parameter settings on realistic model data. Their model uses
different buckets for documents having different change rates and therefore does not
need any theoretical model of the change rate of pages. The main limitations of this
work are the following:

• when the nonlinear equations are solved, the content of Web pages cannot be taken into
consideration. The model cannot easily be extended to (topic-)specific crawlers, which
would be highly advantageous on the exponentially growing Web [21], [22].


• the rapidly changing documents (like those on news sites) are not considered to be in
any bucket; therefore, increasingly important parts of the Web are excluded from the
searches.

However, the main conclusion of the paper is that there may exist some efficient
strategy for incremental crawlers that reduces the number of obsolete pages without the
need for any theoretical model of the change rate of pages.

3 Forager architecture 

There are two different kinds of agents: the foragers and the reinforcing agent (RA).
The fleet of foragers crawls the Web and sends the URLs of the selected documents to the
reinforcing agent. The RA determines which forager should work for it and for how long.
The RA sends reinforcements to the foragers based on the received URLs.

We employ a fleet of foragers to study the competition among individual foragers. The
fleet of foragers allows the load of the searching task to be distributed among
different computers. A forager has simple, limited capabilities, like a limited number
of starting URLs and a simple, content-based URL ordering. The foragers compete with
each other to find the most relevant documents. In this way they efficiently and quickly
collect new relevant documents without direct interaction.

First, the basic algorithms are presented. After that, the reinforcing agent and the
foragers are detailed.
3.1 Algorithms



3.1.1 Weblog algorithm and starting URL selection 

A forager periodically restarts from a URL randomly selected from the list of starting
URLs. The sequence of visited URLs between two restarts forms a path. The starting URL
list is formed from the first START_SIZE = 10 URLs of the weblog. The weblog holds
WEBLOG_SIZE = 100 URLs with their associated weblog values, in descending order. The
weblog value of a URL estimates the expected sum of rewards during a path after visiting
that URL. The weblog update algorithm modifies the weblog before a new path is started
(Algorithm 1). The weblog value of a URL already in the weblog is moved toward the sum of
rewards in the remaining part of the path after that URL. A new URL gets the actual sum
of rewards in the remaining part of the path as its value. A high weblog value means
that there are many relevant documents around that URL; therefore, it may be worth
starting a search from it.



Algorithm 1 Weblog Update. β was set to 0.3

input
  visitedURLs ← the steps of the given path
  values ← the sum of rewards for each step in the given path
output
  starting URL list
method
  cumValues ← cumulated sum of values in reverse order
  newURLs ← visitedURLs not having value in weblog
  revisitedURLs ← visitedURLs having value in weblog
  for each URL ∈ newURLs
    weblog(URL) ← cumValues(URL)
  endfor
  for each URL ∈ revisitedURLs
    weblog(URL) ← (1 − β) weblog(URL) + β cumValues(URL)
  endfor
  weblog ← descending order of values in weblog
  weblog ← truncate weblog after the WEBLOG_SIZE-th element
  starting URL list ← first START_SIZE elements of weblog
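To make the update concrete, here is a minimal Python sketch of Algorithm 1. It is an illustration under our own assumptions, not the original Java implementation: the function name `weblog_update`, the dict-based weblog, and the assumption that a path visits each URL at most once (the visited set prevents loops) are ours; β, WEBLOG_SIZE and START_SIZE take the values given above.

```python
from typing import Dict, List, Tuple

BETA = 0.3         # β in Algorithm 1
WEBLOG_SIZE = 100  # number of URLs kept in the weblog
START_SIZE = 10    # number of top URLs forming the starting URL list


def weblog_update(weblog: Dict[str, float],
                  visited_urls: List[str],
                  step_rewards: List[float]) -> Tuple[Dict[str, float], List[str]]:
    """Hypothetical sketch of Algorithm 1.

    visited_urls[t] is the t-th step of the finished path and step_rewards[t]
    is the sum of rewards received at that step. Returns the updated weblog
    and the new starting URL list.
    """
    # cum_values[t]: sum of rewards from step t to the end of the path
    cum_values = [0.0] * len(step_rewards)
    running = 0.0
    for t in range(len(step_rewards) - 1, -1, -1):
        running += step_rewards[t]
        cum_values[t] = running

    updated = dict(weblog)
    for url, cum in zip(visited_urls, cum_values):
        if url in weblog:
            # revisited URL: move the stored value toward the observed return
            updated[url] = (1 - BETA) * weblog[url] + BETA * cum
        else:
            # new URL: take the observed return as its value
            updated[url] = cum

    # keep the WEBLOG_SIZE best URLs in descending order of value
    ranked = sorted(updated.items(), key=lambda kv: kv[1], reverse=True)[:WEBLOG_SIZE]
    return dict(ranked), [url for url, _ in ranked[:START_SIZE]]
```

For example, with the weblog `{"a": 10.0}`, the path `["a", "b", "c"]` and step rewards `[0, 100, -1]`, the remaining-path returns are 99, 99 and −1, so `b` gets 99, `a` moves to 0.7 · 10 + 0.3 · 99 = 36.7, `c` gets −1, and the starting list becomes `["b", "a", "c"]`.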



Without the weblog algorithm, the weblog, and thus the starting URL list, remains the
same throughout the searches. The weblog algorithm is a very simple version of
evolutionary algorithms. Here, evolution may occur at two different levels: the list of
URLs of the forager evolves through the reordering of the weblog; also, a forager may
multiply, and its weblog, or part of it, may spread through inheritance. This way, the
weblog algorithm incorporates the most basic features of evolutionary algorithms. This
simple form shall be sufficient to demonstrate our statements.
3.1.2 Reinforcement Learning and URL ordering 

A forager can modify its URL ordering based on the reinforcements received for the sent
URLs. The (immediate) profit is the difference between the received rewards and
penalties at any given step. Immediate profit is a myopic characterization of a step to a
URL. Foragers have an adaptive continuous value estimator and follow the policy that
maximizes the expected long-term cumulated profit (LTP) instead of the immediate profit.
Such estimators can easily be realized in neural systems [27, 28, 24]. Policy and profit
estimation are interlinked concepts: profit estimation determines the policy, whereas the
policy influences choices and, in turn, the expected LTP (for a review, see [27]).
Here, choices are based on the greedy LTP policy: the forager visits the URL that
belongs to the frontier (the list of linked but not yet visited URLs, see later) and has
the highest estimated LTP.

In the particular simulation each forager has a $k$ ($= 50$) dimensional probabilistic
term-frequency inverse document-frequency (PrTFIDF) text classifier [10], generated on a
previously downloaded portion of the Geocities database. Fifty clusters were created from
the downloaded documents by Boley's clustering algorithm [2]. The PrTFIDF classifiers
were trained on these clusters plus an additional one, the $(k+1)$th, representing
general texts from the Internet. The PrTFIDF outputs were non-linearly mapped to the
interval $[-1, +1]$ by a hyperbolic tangent function. The classifier was applied to
reduce the texts to a low-dimensional representation. The output vector of the
classifier for the page of URL $A$ is
$\mathrm{state}(A) = (\mathrm{state}(A)_1, \ldots, \mathrm{state}(A)_k)$ (the $(k+1)$th
output was dismissed). This output vector is stored for each URL (Algorithm 2).



Algorithm 2 Page Information Storage

input
  pageURLs ← URLs of pages to be stored
output
  state ← the classifier output vectors for pages of pageURLs
method
  for each URL ∈ pageURLs
    page ← text of page of URL
    state(URL) ← classifier output vector for page
  endfor
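The representation step can be sketched as follows. The classifier itself is not reproduced: `classify` and `fetch_text` are hypothetical callables standing in for the PrTFIDF classifier and the page download, and we assume that the classifier returns k + 1 raw scores with the general-text output last. Any scaling applied before the tanh in the original system is not specified in the text and is omitted here.

```python
import math
from typing import Callable, Dict, Iterable, List

K = 50  # number of topic clusters, i.e. state dimensions


def page_state(raw_scores: List[float], k: int = K) -> List[float]:
    """Map k + 1 raw classifier outputs to a k-dimensional state in [-1, 1]."""
    if len(raw_scores) != k + 1:
        raise ValueError(f"expected {k + 1} classifier outputs, got {len(raw_scores)}")
    # drop the (k+1)-th "general text" output and squash the rest with tanh
    return [math.tanh(score) for score in raw_scores[:k]]


def store_page_information(page_urls: Iterable[str],
                           fetch_text: Callable[[str], str],
                           classify: Callable[[str], List[float]],
                           state: Dict[str, List[float]],
                           k: int = K) -> None:
    """Store state(URL) for every URL in page_urls (Algorithm 2)."""
    for url in page_urls:
        state[url] = page_state(classify(fetch_text(url)), k)
```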



A linear function approximator is used for LTP estimation. It has $k$ parameters, the
weight vector $\mathrm{weight} = (\mathrm{weight}_1, \ldots, \mathrm{weight}_k)$. The LTP
of the document of URL $A$ is estimated as the scalar product of $\mathrm{state}(A)$ and
$\mathrm{weight}$: $\mathrm{value}(A) = \sum_{i=1}^{k} \mathrm{weight}_i\, \mathrm{state}(A)_i$.
During URL ordering the URL with the highest LTP estimate is selected. The URL ordering
algorithm is shown in Algorithm 3.

Algorithm 3 URL Ordering

input
  frontier ← the set of available URLs
  state ← the stored vector representation of the URLs
output
  bestURL ← URL with maximum LTP value
method
  for each URL ∈ frontier
    value(URL) ← sum over i = 1..k of state(URL)_i · weight_i
  endfor
  bestURL ← URL with maximal LTP value

The weight vector of each forager is tuned by Temporal Difference Learning [26, 35]. Let
us denote the current URL by $\mathrm{URL}_n$, the next URL to be visited by
$\mathrm{URL}_{n+1}$, the output of the classifier for $\mathrm{URL}_j$ by
$\mathrm{state}(\mathrm{URL}_j)$, and the estimated LTP of a URL $\mathrm{URL}_j$ by
$\mathrm{value}(\mathrm{URL}_j) = \sum_{i=1}^{k} \mathrm{weight}_i\, \mathrm{state}(\mathrm{URL}_j)_i$.
Assume that the immediate profit of leaving $\mathrm{URL}_n$ for $\mathrm{URL}_{n+1}$ is
$r_{n+1}$. Our estimate is perfect if
$\mathrm{value}(\mathrm{URL}_n) = \mathrm{value}(\mathrm{URL}_{n+1}) + r_{n+1}$. Future
profits are typically discounted in such estimates as
$\mathrm{value}(\mathrm{URL}_n) = \gamma\, \mathrm{value}(\mathrm{URL}_{n+1}) + r_{n+1}$,
where $0 < \gamma < 1$. The error of the value estimate is

$$\delta(n, n+1) = r_{n+1} + \gamma\, \mathrm{value}(\mathrm{URL}_{n+1}) - \mathrm{value}(\mathrm{URL}_n).$$

We used $\gamma = 0.9$ throughout the simulations. For each step
$\mathrm{URL}_n \to \mathrm{URL}_{n+1}$, the weights of the value function were tuned to
decrease the error of the value estimate based on the received immediate profit
$r_{n+1}$. The estimation error $\delta(n, n+1)$ was used to correct the parameters. The
$i$th component of the weight vector, $\mathrm{weight}_i$, was corrected by

$$\Delta \mathrm{weight}_i = \alpha\, \delta(n, n+1)\, \mathrm{state}(\mathrm{URL}_n)_i$$

with $\alpha = 0.1$ and $i = 1, \ldots, k$. In a stationary environment these modified
weights would improve the value estimate (see, e.g., [27] and references therein). The
URL ordering update is given in Algorithm 4.

Without the update algorithm the weight vector remains the same throughout the 
search. 
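The linear value estimate, the greedy choice of Algorithm 3 and the TD correction of Algorithm 4 fit in a few lines. The sketch below is illustrative only: the function names, the dict of stored states and the plain-list weight vector are our assumptions, while γ = 0.9 and α = 0.1 are the values given in the text.

```python
from typing import Dict, List, Sequence

GAMMA = 0.9  # discount factor γ
ALPHA = 0.1  # learning rate α


def value(weight: Sequence[float], state: Sequence[float]) -> float:
    """Linear LTP estimate: sum_i weight_i * state_i."""
    return sum(w * s for w, s in zip(weight, state))


def best_url(frontier: List[str], states: Dict[str, List[float]],
             weight: Sequence[float]) -> str:
    """Greedy URL ordering: the frontier URL with the highest estimated LTP."""
    return max(frontier, key=lambda url: value(weight, states[url]))


def td_update(weight: Sequence[float], state_n: Sequence[float],
              state_next: Sequence[float], reward: float) -> List[float]:
    """One TD(0) correction for the step URL_n -> URL_{n+1} with profit r_{n+1}."""
    delta = reward + GAMMA * value(weight, state_next) - value(weight, state_n)
    return [w + ALPHA * delta * s for w, s in zip(weight, state_n)]
```

Starting from zero weights, a reward of 100 for moving from a page with state (1, 0) to one with state (0, 1) gives δ = 100 and new weights (10, 0), after which `best_url` prefers frontier URLs whose pages resemble the first one.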

3.1.3 Document relevancy 

A document or page is possibly relevant for a forager if it is not older than 24 hours 
and the forager has not marked it previously. Algorithm 5 shows the procedure of
selecting such documents. The selected documents are sent to the RA for further 
evaluation. 
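The forager-side filter of Algorithm 5 amounts to an age check plus a set of already marked pages. In this sketch pages are identified by their URLs and carry a known modification date; both are assumptions, as the text does not say how the age of a page is determined, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

MAX_AGE = timedelta(hours=24)


def select_relevant(pages: Dict[str, datetime], previous: Set[str],
                    now: datetime) -> List[str]:
    """Return the URLs that are at most 24 hours old and not marked before.

    `previous` is updated in place so that a page is sent to the RA only once.
    """
    relevant = [url for url, date in pages.items()
                if now - date <= MAX_AGE and url not in previous]
    previous.update(relevant)
    return relevant
```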

3.1.4 Multiplication of a forager 

During multiplication the weblog is randomly divided into two equal-sized parts (one for
the original and one for the new forager). The parameters of the URL ordering algorithm
(the weight vector of the value estimation) are either copied or newly generated at
random: if the forager has a URL ordering update algorithm, the parameters are copied; if
it does not have one, new random parameters are generated, as shown in Algorithm 6.

Algorithm 4 URL Ordering Update

input
  URL_{n+1} ← the step for which the reinforcement is received
  URL_n ← the previous step before URL_{n+1}
  r_{n+1} ← reinforcement for visiting URL_{n+1}
output
  weight ← the updated weight vector
method
  δ(n, n+1) ← r_{n+1} + γ value(URL_{n+1}) − value(URL_n)
  weight ← weight + α δ(n, n+1) state(URL_n)

Algorithm 5 Document Relevancy at a forager

input
  pages ← the pages to be examined
output
  relevantPages ← the selected pages
method
  previousPages ← previously selected relevant pages
  relevantPages ← all pages from pages which are
    not older than 24 hours and
    not contained in previousPages
  previousPages ← add relevantPages to previousPages

3.2 Reinforcing agent 

A reinforcing agent controls the "life" of foragers. It can start, stop, multiply or
delete foragers. The RA receives the URLs of documents selected by the foragers and
responds with reinforcements for the received URLs. The response is REWARD = 100 (a.u.)
for a relevant document and PENALTY = −1 (a.u.) for a non-relevant document. A document
is relevant if it has not yet been seen by the reinforcing agent and it is not older than
24 hours. The reinforcing agent maintains the score of each forager working for it.
Initially each forager has a score of INIT_SCORE = 100. When a forager sends a URL to
the RA, the forager's score is decreased by SCORE− = 0.05. After each relevant page sent
by the forager, the forager's score is increased by SCORE+ = 1 (Algorithm 7).
Algorithm 6 Multiplication

input
  weblog
  weight vector of URL ordering
output
  newWeblog
  newWeight
method
  newWeblog ← WEBLOG_SIZE/2 randomly selected URLs and values from weblog
  weblog ← delete newWeblog from weblog
  if forager has URL ordering update algorithm
    newWeight ← copy the weight vector of URL ordering
  else
    newWeight ← generate a new random weight vector
  endif
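Algorithm 6 can be sketched as below. The random split follows the text; the uniform range used for fresh weights is purely our assumption, since the paper does not state how new random parameters are drawn, and `len(weblog) // 2` equals WEBLOG_SIZE/2 only when the weblog is full. The function name `multiply` is hypothetical.

```python
import random
from typing import Dict, List, Tuple


def multiply(weblog: Dict[str, float], weight: List[float], has_update: bool,
             rng: random.Random) -> Tuple[Dict[str, float], List[float]]:
    """Split the weblog in half and build the child's URL ordering weights.

    The parent's weblog is modified in place: the child's half is removed.
    """
    chosen = rng.sample(sorted(weblog), len(weblog) // 2)
    child_weblog = {url: weblog.pop(url) for url in chosen}
    if has_update:
        child_weight = list(weight)  # copy the learned weight vector
    else:
        # assumed distribution for new random parameters
        child_weight = [rng.uniform(-1.0, 1.0) for _ in weight]
    return child_weblog, child_weight
```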



When the forager's score reaches MAX_SCORE = 200 and the number of foragers is smaller
than MAX_FORAGER = 16, the forager is multiplied. That is, a new forager is created with
the same algorithms as the original one, but with slightly different parameters. When
the forager's score goes below MIN_SCORE and the number of foragers is larger than
MIN_FORAGER = 2, the forager is deleted (Algorithm 8). Note that a forager can be
multiplied or deleted immediately after it has been stopped by the RA and before the
next forager is activated.

Foragers on the same computer work in time slices, one after the other. Each forager
works for some amount of time determined by the RA. Then the RA stops that forager and
starts the next one it selects. The pseudo-code of the reinforcing agent is given in
Algorithm 9.

3.3 Foragers 

A forager is initialized with parameters defining the URL ordering, and either with a
weblog or with a seed of URLs (Algorithm 10). After its initialization a forager crawls
in search paths; that is, after a given number of steps the search restarts, and the
steps between two restarts form a path. During each path the forager takes
MAX_STEP = 100 steps, i.e., it selects the next URL to be visited with a URL ordering
algorithm. At the beginning of a path a URL is selected randomly from the starting URL
list. This list is formed from the first 10 URLs of the weblog. The weblog contains the
possibly good starting URLs with their associated weblog values in descending order. The
weblog algorithm modifies the weblog, and thus the starting URL list, before a new path
is started. When a forager is restarted by the RA after the RA has stopped it, the
forager continues from the internal state in which it was stopped. The pseudo-code of
step selection is given in Algorithm 11.

The URL ordering algorithm selects a URL to be the next step from the frontier URL set.
The selected URL is removed from the frontier and added to the visited URL set to avoid
loops. After downloading the pages, only those URLs (linked from the visited URL) that
are not in the visited set are added to the frontier.

In each step the forager downloads the page of the selected URL and all of the pages
linked from it. It sends the URLs of the possibly relevant pages to the reinforcing
agent. The forager receives reinforcements for any previously sent but not yet
reinforced URLs and calls the URL ordering update algorithm with the received
reinforcements. The pseudo-code of a forager is shown in Algorithm 12.

Algorithm 7 Manage Received URL

input
  URL, forager ← received URL from forager
output
  reinforcement to forager
  updated forager score
method
  relevants ← relevant pages seen by the RA
  page ← get page of URL
  decrease forager's score by SCORE−
  if page ∈ relevants or page date is older than 24 hours
    send PENALTY to forager
  else
    relevants ← add page to relevants
    send REWARD to forager
    increase forager's score by SCORE+
  endif
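A minimal sketch of the RA's bookkeeping in Algorithm 7, using the constants of Section 3.2. Identifying a page by its URL and passing its date explicitly are our assumptions; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Set

REWARD = 100.0
PENALTY = -1.0
INIT_SCORE = 100.0
SCORE_MINUS = 0.05  # SCORE- in the text
SCORE_PLUS = 1.0    # SCORE+ in the text
MAX_AGE = timedelta(hours=24)


@dataclass
class ForagerRecord:
    score: float = INIT_SCORE


@dataclass
class ReinforcingAgent:
    relevants: Set[str] = field(default_factory=set)  # relevant pages seen by the RA

    def manage_received_url(self, url: str, page_date: datetime, now: datetime,
                            forager: ForagerRecord) -> float:
        """Score the forager and return its reinforcement for `url`."""
        forager.score -= SCORE_MINUS  # every submission costs SCORE-
        if url in self.relevants or now - page_date > MAX_AGE:
            return PENALTY
        self.relevants.add(url)       # only the first submitter is rewarded
        forager.score += SCORE_PLUS
        return REWARD
```

Each relevant submission changes the score by +1 − 0.05 = +0.95, so a forager starting at INIT_SCORE = 100 whose every submission is relevant needs 106 submissions (106 × 0.95 = 100.7) to exceed MAX_SCORE = 200, while each non-relevant submission costs only 0.05.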
Algorithm 8: Manage Forager

input
  forager ← the forager to be multiplied or deleted
output
  possibly modified list of foragers
method
  if (forager's score > MAX_SCORE and number of foragers < MAX_FORAGER)
    weblog, URLordering ← call forager's Multiplication, Alg. 6
      forager may modify its own weblog
    newForager ← create a new forager with the received weblog and URLordering
    set the two foragers' score to INIT_SCORE
  else if (forager's score < MIN_SCORE and number of foragers > MIN_FORAGER)
    delete forager
  endif
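The multiplication and deletion rule of Algorithm 8 can be sketched as follows. MIN_SCORE is passed in as a parameter because its value is not given in the text; the weblog split of Algorithm 6 is omitted here, and the `Forager` record and the child naming are hypothetical.

```python
from dataclasses import dataclass
from typing import List

INIT_SCORE = 100.0
MAX_SCORE = 200.0
MAX_FORAGER = 16
MIN_FORAGER = 2


@dataclass(eq=False)  # identity comparison, so list.remove deletes this exact forager
class Forager:
    name: str
    score: float = INIT_SCORE


def manage_forager(forager: Forager, foragers: List[Forager], min_score: float) -> None:
    """Multiply or delete `forager` in place, following Algorithm 8."""
    if forager.score > MAX_SCORE and len(foragers) < MAX_FORAGER:
        child = Forager(name=forager.name + "'")
        forager.score = INIT_SCORE  # both foragers restart from INIT_SCORE
        child.score = INIT_SCORE
        foragers.append(child)
    elif forager.score < min_score and len(foragers) > MIN_FORAGER:
        foragers.remove(forager)
```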



Algorithm 9: Reinforcing Agent

input
  seed URLs
output
  relevants ← found relevant documents
method
  relevants ← empty set /* set of all observed relevant pages */
  initialize MIN_FORAGER foragers with the seed URLs
  set one of them to be the next
  repeat
    start next forager
    receive possibly relevant URL
    call Manage Received URL, Alg. 7, with URL
    stop forager if its time period is over
    call Manage Forager, Alg. 8, with this forager
    choose next forager
  until time is over
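The time-slice loop of Algorithm 9 can be sketched with a queue. The round-robin order is our assumption (the text only says that the RA chooses the next forager), and `run_slice` and `manage` are hypothetical callbacks standing in for a forager's work during one slice and for Algorithm 8.

```python
from collections import deque
from typing import Callable, Deque, List, TypeVar

F = TypeVar("F")


def run_reinforcing_agent(fleet: Deque[F],
                          run_slice: Callable[[F], None],
                          manage: Callable[[F], List[F]],
                          n_slices: int) -> None:
    """Run foragers one after another in time slices.

    `manage` returns the foragers to keep scheduling: [] if the forager was
    deleted, [forager] if unchanged, or [forager, child] after multiplication.
    """
    for _ in range(n_slices):
        if not fleet:
            break
        forager = fleet.popleft()      # start the next forager
        run_slice(forager)             # it works until its time period is over
        fleet.extend(manage(forager))  # stopped foragers rejoin the end of the queue
```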
Algorithm 10 Initialization of the forager

input
  weblog or seed URLs
  URL ordering parameters
output
  initialized forager
method
  set path step number to MAX_STEP + 1 /* start new path */
  set the weblog
    either with the input weblog
    or put the seed URLs into the weblog with weblog value
  set the URL ordering parameters in URL ordering algorithm



Algorithm 11: URL Selection

input
    frontier ← set of URLs available in this step
    visited ← set of visited URLs in this path
output
    step ← selected URL to be visited next
method
    if path step number < MAX_STEP
        step ← selected URL by URL Ordering (Alg. 3)
        increase path step number
    else
        call Weblog Update to update the weblog
        step ← select a random URL from the starting URL list
        set path step number to 1
        frontier ← empty set
        visited ← empty set
    endif
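The path logic of Algorithms 10 and 11 (follow the URL ordering inside a path, restart from a random weblog URL after MAX_STEP steps) can be sketched in Python as below. MAX_STEP = 5 is a hypothetical value, and `url_value` and `update_weblog` are hypothetical callables standing in for URL Ordering and Weblog Update. The check for an empty frontier is an extra guard that is not part of Algorithm 11.

```python
import random

MAX_STEP = 5  # hypothetical path length; the paper's value is not given here


class PathState:
    """Per-forager state used by URL Selection (Alg. 11)."""

    def __init__(self, weblog, rng):
        self.weblog = list(weblog)        # starting URL list
        self.step_number = MAX_STEP + 1   # as in Alg. 10: first call starts a new path
        self.frontier, self.visited = set(), set()
        self.rng = rng


def select_url(state, url_value, update_weblog):
    """Return the next URL to visit and update the path state."""
    # Extra guard (not in Alg. 11): also restart if the frontier is empty.
    if state.step_number < MAX_STEP and state.frontier:
        step = max(state.frontier, key=url_value)   # stand-in for URL Ordering
        state.step_number += 1
    else:
        update_weblog(state)                        # stand-in for Weblog Update
        step = state.rng.choice(state.weblog)
        state.step_number = 1
        state.frontier, state.visited = set(), set()
    return step
```

Because initialization sets the step counter above MAX_STEP, the very first call always starts a fresh path from the weblog.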






Algorithm 12: Forager

input
    frontier ← set of URLs available in the next step
    visited ← set of visited URLs in the current path
output
    sent documents to the RA
    modified frontier and visited
    modified weblog and URL ordering weight vector
method
    repeat
        step ← call URL Selection (Alg. 11)
        frontier ← remove step from frontier
        visited ← add step to visited
        page ← download the page of step
        linkedURLs ← links of page
        newURLs ← linkedURLs which are not visited
        frontier ← add newURLs to frontier
        download pages of linkedURLs
        call Page Information Storage with newURLs
        relevantPages ← call Document Relevancy for all pages
        send relevantPages to reinforcing agent
        receive reinforcements for sent but not yet reinforced pages
        call URL Ordering Update with the received reinforcements
    until time is over
4 Experiments



We conducted an 18 day long experiment on the Web to gather realistic data. We used
the gathered data in simulations to compare the weblog update (Section 3.1.1) and
reinforcement learning (Section 3.1.2) algorithms. In the Web experiment we used a
fleet of foragers combining the reinforcement learning and weblog update algorithms
to eliminate any bias in the gathered data. First we describe the experiment on the
Web, then the simulations. We analyze our results at the end of this section.

4.1 Web 

We ran the experiment on the Web on a single personal computer with a Celeron 1000
MHz processor and 512 MB RAM. We implemented the forager architecture (described
in Section 3) in the Java programming language.

In this experiment a fixed number of foragers competed with each other to collect
news at the CNN web site. The foragers ran in equal time intervals in a predefined
order. Each forager had a 3 minute time interval, after which it was allowed to finish
the step it had started before the end of the interval. We deployed 8 foragers using
both the weblog update and the reinforcement learning based URL ordering update
algorithms (8 WLRL foragers). We also deployed 8 other foragers using the weblog
update algorithm but without reinforcement learning (8 WL foragers). The predefined
order of the foragers was the following: the 8 WLRL foragers were followed by the 8
WL foragers.

We investigated the link structure of the gathered Web pages. As shown in Fig. 1,
the links have a power-law distribution, $P(k) \propto k^{\gamma}$, with $\gamma = -1.3$ for
outgoing links and $\gamma = -2.57$ for incoming links. That is, the link structure has
the scale-free property. The clustering coefficient [31] of the link structure is 0.02 and
the diameter of the graph is 7.2893. We applied two different random permutations to
the origins and to the endpoints of the links, keeping the edge distribution unchanged
but randomly rewiring the links. The new graph has a clustering coefficient of 0.003
and a diameter of 8.2163. That is, the clustering coefficient is smaller than the original
value by about an order of magnitude, but the diameter is almost the same. Therefore
we can conclude that the links of the gathered pages form a small world structure.
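The small-world test above (compare the clustering coefficient before and after permuting link origins and endpoints) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Matlab code: the clustering coefficient is the average local coefficient of the undirected version of the graph, nodes with fewer than two neighbors contribute 0 (one common convention), and self-loops or duplicate edges created by rewiring are ignored.

```python
import random
from itertools import combinations


def clustering_coefficient(edges):
    """Average local clustering coefficient of the undirected version of a graph."""
    nbrs = {}
    for u, v in edges:
        if u != v:                           # ignore self-loops
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue                         # contributes 0 by convention
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(nbrs)


def rewire(edges, rng):
    """Permute link origins and endpoints independently.

    Every node keeps its out-degree and in-degree, so the degree
    distributions are unchanged, while local clustering is destroyed.
    """
    sources = [u for u, _ in edges]
    targets = [v for _, v in edges]
    rng.shuffle(sources)
    rng.shuffle(targets)
    return list(zip(sources, targets))
```

Applied to a crawled link list, a large drop of `clustering_coefficient` after `rewire`, together with a nearly unchanged diameter, is the small-world signature the text relies on.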

The data storage for the simulation is a centralized component. The pages are stored
with 2 indices (and time stamps). One index is the URL index, the other is the page
index. Multiple pages can have the same URL index if they were downloaded from
the same URL. The page index uniquely identifies a page content and the URL from
which the page was downloaded. At each page download by any forager we stored the
following (with a time stamp containing the time of the page download):

1. if the page is relevant according to the RA then store "relevant" 

2. if the page is from a new URL then store the new URL with a new URL index 
and the page's state vector with a new page index 

3. if the content of the page is changed since the last download then store the 
page's state vector with a new page index but keep the URL index 

4. in both previous cases store the links of the page as links to page indices of the 
linked pages 

(a) if a linked page is from a new URL then store the new URL with a new
URL index and the linked page's state vector with a new page index

(b) if the content of the linked page has changed since the last check then store
the page's state vector with a new page index but the same URL index

Figure 1: Scale-free property of the Internet domain. Log-log scale distri-
bution of the number of (incoming and outgoing) links of all URLs found during
the time course of investigation. Horizontal axis: number of edges (log k). Ver-
tical axis: relative frequency of number of edges at different URLs (log P(k)).
Dots and dark line correspond to outgoing links, crosses and gray line correspond
to incoming links.
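The two-index storage scheme described above (a URL index per distinct URL, a new page index whenever a URL's content changes, plus time stamps and "relevant" marks) can be sketched in Python as an in-memory structure. This is a minimal sketch under assumptions: the page content stands in for the page's state vector, and storing links as page indices is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class PageStore:
    """URL index + page index storage (in-memory sketch)."""
    url_index: dict = field(default_factory=dict)  # URL -> URL index
    pages: list = field(default_factory=list)      # page index -> (URL index, content, time)
    latest: dict = field(default_factory=dict)     # URL index -> latest page index
    relevant: list = field(default_factory=list)   # (page index, time) marked relevant

    def store(self, url, content, time, is_relevant=False):
        """Record one download and return the page index of the current version."""
        if url not in self.url_index:               # new URL: new URL index
            self.url_index[url] = len(self.url_index)
        uidx = self.url_index[url]
        last = self.latest.get(uidx)
        if last is None or self.pages[last][1] != content:
            # New URL or changed content: new page index, same URL index.
            self.pages.append((uidx, content, time))
            self.latest[uidx] = len(self.pages) - 1
        pidx = self.latest[uidx]
        if is_relevant:
            self.relevant.append((pidx, time))
        return pidx
```

Re-downloading an unchanged page returns the existing page index, so only genuine changes create new page versions that the simulation can later replay.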

4.2 Simulation 

For the simulations we implemented the forager architecture in Matlab. The foragers 
were simulated as if they were running on one computer as described in the previous 
section. 

4.2.1 Simulation specification 

During the simulations we used the Web pages that we had gathered previously to
generate a realistic environment (note that links point to local pages, not to pages
on the Web, since each link was stored as a link to a local page index):

• Simulated documents had the same state vector representation for URL ordering 
as the real pages had 

• Simulated relevant documents were the same as the relevant documents on the 
Web 

• Pages and links appeared at the same (relative) time when they were found in
the Web experiment, using the new URL indices and their time stamps

• Pages and links were refreshed or changed at the same relative time as the changes
were detected in the Web experiment, using the new page indices for existing
URL indices and their time stamps

• Simulated time of a page download was the average download time of a real 
page during the Web experiment. 
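Replaying the stored pages at their recorded relative times, as listed above, amounts to looking up the latest version of a URL at or before the simulated time. A minimal Python sketch, assuming a hypothetical input that maps each URL to its recorded (time stamp, content) pairs:

```python
import bisect


class ReplayEnvironment:
    """Serve the stored Web snapshot as it looked at a simulated time t."""

    def __init__(self, versions):
        # versions: URL -> list of (time stamp, content), in any order.
        self.times = {u: [t for t, _ in sorted(vs)] for u, vs in versions.items()}
        self.contents = {u: [c for _, c in sorted(vs)] for u, vs in versions.items()}

    def fetch(self, url, t):
        """Latest content of `url` recorded at or before time t, else None."""
        ts = self.times.get(url)
        if not ts:
            return None                       # URL never seen
        i = bisect.bisect_right(ts, t) - 1    # last version with time <= t
        return self.contents[url][i] if i >= 0 else None
```

A URL that has not yet "appeared" at time t returns None, which mirrors pages and links appearing at the same relative time as in the Web experiment.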

We conducted simulations with two different kinds of foragers. In the first case the
foragers used only the weblog update algorithm, without the URL ordering update
(WL foragers). In the second case the foragers used only the reinforcement learning
based URL ordering update algorithm, without the weblog update algorithm (RL
foragers). Each WL forager had a different weight vector for URL value estimation;
during multiplication the new forager got a new random weight vector. All RL foragers
used the same weblog, containing the first 10 URLs of the gathered pages, that is, the
starting URL of the Web experiment and the first 9 URLs visited during that experiment.
In both cases there were initially 2 foragers, and they were allowed to multiply until
the population reached 16 foragers. The simulation for each type of forager was
repeated 3 times with different initial weight vectors for each forager. The variance
of the results shows only a small difference between simulations using the same kind
of foragers, even though the foragers were started with different random weight
vectors in each simulation.

Table 1: Investigated parameters

downloaded            the number of downloaded documents
sent                  the number of documents sent to the RA
relevant              the number of found relevant documents
found URLs            the number of found URLs
download efficiency   the ratio of relevant to downloaded documents in a 3 hour time
                      window throughout the simulation
sent efficiency       the ratio of relevant to sent documents in a 3 hour time window
                      throughout the simulation
relative found URL    the ratio of found URLs to downloaded documents at the end of
                      the simulation
freshness             the ratio of the number of current found relevant documents to
                      the number of all found relevant documents. A stored document
                      is current (up to date) if its content is exactly the same as the
                      content of the corresponding URL in the environment.
age                   a stored current document has age zero; the age of an obsolete
                      page is the time since the last refresh of the page on the Web [3].

4.2.2 Simulation measurements 

Table 1 shows the parameters investigated during the simulations.

The parameter 'download efficiency' is relevant for the site where the foragers are
deployed to gather the new information, while 'sent efficiency' is relevant for the
RA. Note that during the simulations we are able to calculate freshness and age values
immediately and precisely. In a real Web experiment it is impossible to calculate these
values precisely, because of the time needed to download all of the real Web pages
and compare their contents to the stored ones.
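The Table 1 measures that the simulation can compute exactly (windowed efficiencies, freshness and age) can be sketched in Python as follows. The inputs are hypothetical: events are (time in hours, is_relevant) pairs, and the Web's change history is given per URL. Age follows the wording of Table 1: zero for a current copy, otherwise the time since the page's most recent change on the Web.

```python
from collections import defaultdict

WINDOW = 3.0  # hours, the time window used in Table 1


def windowed_efficiency(events, window=WINDOW):
    """Relevant / all events per time window: {window number: ratio}.

    Use downloads for download efficiency, sent documents for sent efficiency.
    """
    counts = defaultdict(lambda: [0, 0])          # window -> [relevant, all]
    for t, rel in events:
        c = counts[int(t // window)]
        c[0] += int(rel)
        c[1] += 1
    return {w: r / n for w, (r, n) in sorted(counts.items())}


def freshness_and_age(stored, web_changes, now):
    """Freshness and mean age (hours) of stored documents at time `now`.

    stored: URL -> time the copy was stored;
    web_changes: URL -> times at which the page changed on the Web.
    """
    current, ages = 0, []
    for url, stored_at in stored.items():
        missed = [c for c in web_changes.get(url, []) if stored_at < c <= now]
        if missed:                                # obsolete copy
            ages.append(now - max(missed))
        else:                                     # current copy: age zero
            current += 1
            ages.append(0.0)
    return current / len(stored), sum(ages) / len(stored)
```

In a simulation `web_changes` is known exactly from the replayed data, which is why freshness and age can be computed precisely there but not on the live Web.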

4.2.3 Simulation analysis 

The values in Table 2 are averaged over the 3 runs for each type of forager.
From Table 2 we can conclude the following:

• RL and WL foragers have similar download efficiency, i.e., the efficiencies from
the point of view of the news site are about the same.

• WL foragers have higher sent efficiency than RL foragers, i.e., the efficiency
from the point of view of the RA is higher. This shows that WL foragers divide
the search area among themselves better than RL foragers do. Sent efficiency would
be 1 if no two foragers had sent the same document to the RA.

• RL foragers have a higher relative found URL value than WL foragers: RL foragers
explore more, and they found more URLs per downloaded page than WL foragers did.

• WL foragers find the new relevant documents in the already found clusters faster.
That is, freshness is higher and age is lower than in the case of RL foragers.

Table 2: Simulation results. The 3rd and 5th columns contain the standard
deviation of the individual experiment results from the average values.

type                    RL        std RL    WL        std WL
downloaded              540636    9840      669673    9580
sent                    9747      98        6345      385
relevant                2419      45        3107      60
found URLs              31092     1050      33116     3370
download efficiency     0.0045    0.0001    0.0046    0.0001
sent efficiency         0.248     0.003     0.49      0.031
relative found URL      0.058     0.001     0.05      0.006
freshness               0.7       0.006     0.74      0.011
age (in hours)          1.79      0.04      1.56      0.08
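As a consistency check, the ratio rows of Table 2 can be approximately recomputed from its count rows. This is only a sketch: the counts are averages over 3 runs and Table 1 defines the efficiencies over 3 hour windows, so the whole-run ratio of averages computed here is expected to match only to the precision reported.

```python
# Averaged counts from Table 2.
totals = {
    "RL": {"downloaded": 540636, "sent": 9747, "relevant": 2419, "found": 31092},
    "WL": {"downloaded": 669673, "sent": 6345, "relevant": 3107, "found": 33116},
}


def overall_ratios(t):
    """Whole-run versions of the Table 1 ratios, computed from the counts."""
    return {
        "download efficiency": t["relevant"] / t["downloaded"],
        "sent efficiency": t["relevant"] / t["sent"],
        "relative found URL": t["found"] / t["downloaded"],
    }


for kind, t in totals.items():
    print(kind, {k: round(v, 4) for k, v in overall_ratios(t).items()})
```

The recomputed values agree with the reported ones (about 0.0045 and 0.0046 for download efficiency, 0.248 and 0.49 for sent efficiency), and they make the key contrast explicit: WL foragers sent about a third fewer documents yet obtained more relevant ones.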




Figure 2: Efficiency. Horizontal axis: time in days. Vertical axis: download 
efficiency, that is the number of found relevant documents divided by number 
of downloaded documents in 3 hour time intervals. Upper figure shows RL 
foragers' efficiencies, lower figure shows WL foragers' efficiencies. For all of the 
3 simulation experiments there is a separate line. 

Fig. 2 shows other aspects of the different behaviors of RL and WL foragers. The
download efficiency of RL foragers has more, higher, and sharper peaks than that of
WL foragers. That is, WL foragers are more balanced in finding new relevant documents
than RL foragers. The reason is that while WL foragers remain in the good clusters
they have found, RL foragers continuously explore new promising territories. The
sharp peaks in efficiency show that RL foragers find and recognize new good territories
and then quickly collect the current relevant documents from there. The foragers
can recognize these places because they receive more rewards from the RA when they
send URLs from these places.
Figure 3: Freshness and Age. Horizontal axis: time in days. Upper vertical
axis: freshness of found relevant documents in 3 hour time intervals. Lower
vertical axis: age in hours of found relevant documents in 3 hour time intervals.
Dotted lines correspond to weblog foragers, continuous lines correspond to RL
foragers.

The predefined order did not influence the working of the foragers during the Web
experiment. Fig. 2 shows that the foragers did not have very different efficiencies in
the 3 independent experiments. Fig. 3 shows that the foragers in each run behaved
very similarly in terms of age and freshness, that is, the values remain close to each
other throughout the experiments. Also, the results of the individual runs were close
to the average values in Table 2 (see the standard deviations). In each individual run
the foragers were started with different weight vectors, but they reached similar
efficiencies and behavior. This means that the initial conditions of the foragers did
not influence their later behavior during the simulations. Furthermore, the foragers
could not change their environment drastically (in terms of the found relevant
documents) during a single 3 minute run, because of the short run time intervals and
the fast change of the environment: a large number of new pages and frequently
updated pages on the news site. During the Web experiment the foragers ran in the
temporal order 8 WLRL, 8 WL, 8 WLRL, 8 WL, .... Because the initial conditions do
not influence the long term performance of the foragers, and the foragers cannot
change their environment fully, we can start to examine them after the first run of
the WLRL foragers. This gives the other extreme order of foragers, that is, the
temporal ordering 8 WL, 8 WLRL, 8 WL, 8 WLRL, .... For the overall efficiency and
behavior of the foragers it did not really matter whether the WLRL or the WL foragers
ran first, and one could use a mixed order in which a WL forager runs after each
WLRL forager and a WLRL forager runs after each WL forager. However, for higher
bandwidths and faster computers, random ordering may be needed for such
comparisons.

5 Discussion 

Our first conjecture is that selection is efficient on scale-free small world structures.
Lorincz and Kokai [15] and Rennie et al. [21] showed that RL is efficient in the task
of finding relevant information on the Web. Here we have shown experimentally that
the weblog update algorithm, i.e., selection among starting URLs, is at least as
efficient as the RL algorithm. The weblog update algorithm finds as many relevant
documents as RL does if both download the same number of pages. The URLs that
WL foragers send to the RA overlap less within their fleet than those sent by RL
foragers within theirs; therefore there are more relevant documents among those
selected by WL foragers than among those selected by RL foragers. Also, the freshness
and age of the found relevant documents are better for WL foragers than for RL
foragers.

For the weblog update algorithm, the selection among starting URLs has no fine
tuning mechanism. Throughout its life a forager searches for the same kind of
documents, that is, it goes in the same 'direction' in the state space of document
states, determined by its fixed weight vector. The only adaptation allowed for a WL
forager is to select starting URLs from the already seen URLs. A WL forager cannot
modify its ('directional') preferences according to the newly found relevant document
supply, where relevant documents are abundant. But a WL forager finds good relevant
document sources in its own direction and forces its search to stay at those places.
By chance the forager can find better sources in its own direction if the search path
from a starting URL is long enough. Fig. 2 shows that the download efficiency of the
foragers does not decrease as the foragers multiply. Therefore the new foragers must
find new, good relevant document sources quickly after they appear.
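The contrast drawn above (a WL forager changes only which starting URLs it keeps, never its weight vector) can be illustrated with a toy selection step in Python. This is a hypothetical sketch, not the weblog update algorithm of Section 3.1.1: it assumes each URL seen on the last path has earned a numeric credit for relevant documents found from it, and the weblog keeps the `size` best-valued URLs.

```python
import heapq


def weblog_update(weblog, path_scores, size):
    """Selection among starting URLs: keep the `size` best-valued URLs.

    weblog: URL -> value; path_scores: URL -> credit earned on the last path.
    The scoring rule and `size` are assumptions for illustration.
    """
    merged = dict(weblog)
    for url, score in path_scores.items():
        merged[url] = max(merged.get(url, float("-inf")), score)
    best = heapq.nlargest(size, merged.items(), key=lambda kv: kv[1])
    return dict(best)
```

Nothing in this step touches the forager's weight vector, so its search direction stays fixed; only good starting points survive, which is why such a forager tends to remain in the sources it has already found.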

The reinforcement learning based URL ordering update algorithm is capable of fine
tuning the search of a forager by adapting the forager's weight vector. This feature
has been shown to be crucial for adapting crawling to novel environments [13], [15].
An RL forager goes in the direction (in the state space of document states) where
the estimated long term cumulated profit is the highest. Because the local environment
of the foragers may change rapidly during crawling, it seems desirable that foragers
can quickly adapt to newly found relevant documents. However, relevant documents
may appear in isolation, without forming a good relevant document source, or may
appear at the wrong URL by mistake. This noise of the Web can derail RL foragers
from good regions: because of their fast adaptation capabilities, RL foragers may
"turn" into less valuable directions.
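How a single reinforcement moves an RL forager's weight vector can be illustrated with a generic linear TD(0) update of a value estimate V(s) = w · x(s). This is a textbook update chosen for illustration; the learning rate, discount factor and feature vectors are hypothetical, and the URL ordering update of Section 3.1.2 may differ in detail.

```python
def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """One TD(0) step for a linear value estimate V(s) = w . x(s)."""
    v = sum(w * x for w, x in zip(weights, features))
    v_next = sum(w * x for w, x in zip(weights, next_features))
    delta = reward + gamma * v_next - v        # temporal-difference error
    # Move the weights along the features of the rewarded document.
    return [w + alpha * delta * x for w, x in zip(weights, features)]
```

Every reward shifts the weights toward the features of the document that earned it, including a lone relevant document that is not part of any good source; this is the mechanism by which noise on the Web can turn an RL forager away from a good region.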

Our second conjecture is that selection fits SFSW structures better than RL does. We
have shown in our experiments that selection and RL behave differently. Selection
picks good information sources that are worth revisiting, and stays at those sources
as long as better sources are not found by chance. RL explores new territories and
adapts to them. This adaptation can be a disadvantage compared with the more
rigid selection algorithm, which sticks to good places until 'provably' better places
are discovered. Therefore WL foragers, which cannot be derailed and stay in the
'niches' they have found, can find new relevant documents in such already known
terrains faster than RL foragers can. That is, freshness is higher and age is lower for
relevant documents found by WL foragers than for those found by RL foragers. Also,
by finding good sources and staying there, WL foragers divide the search task better
than RL foragers do; this is the reason for the higher sent efficiency of WL foragers.

We rewired the network as described in Section 4.1. In this way a scale-free (SF)
but not so small world was created. Intriguingly, in this SF structure, RL foragers
performed better than WL ones. Clearly, further work is needed to compare the
behavior of the selective and the reinforcement learning algorithms in environments
other than SFSW. Such findings should be relevant to the deployment of machine
learning methods in different problem domains.

From a practical point of view, we note that it is easy to combine the present
algorithm with URLs offered by search engines. Also, the values reported by the
crawlers about certain environments, e.g., the environment of a URL offered by a
search engine, represent the neighborhood of that URL and can serve adaptive
filtering. This procedure is indeed promising for guiding individual searches, as has
been shown elsewhere [20].

6 Conclusion 

We presented our selection algorithm and compared it to the well-known reinforcement
learning algorithm. Our comparison was based on finding new relevant documents on
the Web, that is, in a dynamic scale-free small world environment. We have found that
the weblog update selection algorithm performs better in this environment than the
reinforcement learning algorithm, even though the reinforcement learning algorithm
has been shown to be efficient in finding relevant information [15], [21]. We explain
our results by the different behaviors of the algorithms. The weblog update algorithm
finds good relevant document sources and remains in these regions until better places
are found by chance. Individuals using this selection algorithm are able to quickly
collect the new relevant documents from the already known places because they
monitor these places continuously. The reinforcement learning algorithm explores new
territories for relevant documents and, if it finds a good place, collects the existing
relevant documents from there. The continuous exploration and the fine tuning
property of RL cause RL to find relevant documents more slowly than the weblog
update algorithm.

In future work we will study the combination of the weblog update and the RL
algorithms. This combination joins the WL foragers' ability to stay in good regions
with the RL foragers' fine tuning capability. In this way foragers will be able to move
to new sources with the RL algorithm and monitor the already found good regions
with the weblog update algorithm.

We will also study the foragers in a simulated environment that is not a small
world. The clusters of a small world environment make it easier for WL foragers to
stay in good regions. The small diameter due to the long-distance links of a small
world environment makes it easier for RL foragers to explore different regions. This
work will measure the extent to which the different foragers rely on the small world
property of their environment.